METHODS FOR OPERATING MASS SPECTROMETRY (MS) INSTRUMENT SYSTEMS



BACKGROUND OF THE INVENTION 



FIELD OF THE INVENTION 

The present invention generally relates to the field of Mass Spectrometry (MS) and, more
particularly, to methods for calibrating MS instrument systems and for processing MS data.



BACKGROUND OF THE INVENTION 

Mass Spectrometry (MS) is a 100-year-old technology that relies on the ionization and
fragmentation of molecules, the dispersion of the fragment ions by their masses, and the proper
detection of the ion fragments on the appropriate detectors. There are many ways to achieve
each of these three key MS processes, which give rise to different types of MS instrumentation
having distinct characteristics.

Four major types of ionization techniques are commonly used both to break apart a larger
molecule into many smaller molecules and, at the same time, to ionize them so that they are
properly charged before mass dispersion. These ionization schemes include Electrospray
Ionization (ESI), Electron Impact Ionization (EI) through the impact of high-energy electrons,
Chemical Ionization (CI) through the use of other reactive compounds, and Matrix-Assisted
Laser Desorption and Ionization (MALDI). Both ESI and MALDI also serve as means for
sample introduction.

Once the molecules in a sample are fragmented and charged through ionization, each
fragment will have a corresponding mass-to-charge (m/z) ratio, which becomes the basis for
mass dispersion. Based on the physical principles used, there are many different ways to achieve
mass dispersion, resulting in mass spectral data similar in nature but different in details. A few
of the commonly seen configurations include: magnetic sectors; quadrupoles; Time-Of-Flight
(TOF); and Fourier Transform Ion-Cyclotron Resonance (FT ICR).

The magnetic sector configuration is the most straightforward mass dispersion
technique, where ions with different m/z ratios separate in a magnetic field and exit this
field at spatially separated locations, where they are detected with either a fixed array of
detector elements or a movable set of small detectors that can be adjusted to detect different ions
depending on the application. This is a simultaneous configuration where all ions from the
sample are separated simultaneously in space rather than sequentially in time.

The quadrupole configuration is perhaps the most popular MS configuration, where ions
of different m/z values are filtered by a set of (usually 4) parallel rods through the
manipulation of RF/DC ratios applied to these rod pairs. Only ions of a certain m/z value will
survive the trip through these rods at a given RF/DC ratio, resulting in the sequential separation
and detection of fragment ions. Due to its sequential nature, only one detector element is
required for detection. Another configuration that uses ion traps can be considered a special
example of quadrupole MS.
The Time-Of-Flight (TOF) configuration is another sequential dispersion and detection
scheme that lets the fragment ions accelerate under an electrical field through a high-vacuum flight
tube before detection. Ions of different m/z values arrive at the detector at different times,
and the arrival time can be related to the m/z values through the use of calibration standard(s).

In Fourier Transform Ion-Cyclotron Resonance (FT ICR), after fragmentation and
ionization, all ions can be introduced to an ion cyclotron where ions of different m/z ratios are
trapped and resonate at different frequencies. These ions can be pulsed out through the
application of a Radio Frequency (RF) signal and the ion intensities measured as a function of
time on a detector. Upon Fourier transformation of the measured time domain data, one gets
back the frequency domain data, where the frequency can be related back to m/z ratios through
the use of calibration standard(s).

Ions can be detected either directly by the use of Faraday cups or indirectly by the use of
electron multiplier tubes (EMT)/plates (EMP) or photon multiplier tubes (PMT) after a converter
that converts ions into light. FIGs. 1A, 1B, and 1C are diagrams illustrating a typical mass
spectral data trace on different ion intensity scales 110, 120, and 130, respectively, plotted as a
function of m/z ratio, according to the prior art.

The past one hundred years have witnessed tremendous strides in MS
instrumentation, with many different flavors of instruments designed and built for high
throughput, high resolution, and high sensitivity work. The instrumentation has been developed
to a stage where single ion detection can be routinely accomplished on most commercial MS
systems with unit mass resolution, allowing for the observation of ion fragments coming from
different isotopes. In stark contrast to the sophistication in hardware, very little has been done to
systematically and effectively analyze the massive amount of MS data generated by modern MS
instrumentation.

On a typical mass spectrometer, the user is usually required to use, or is supplied with, a standard
material having several fragment ions covering the mass spectral m/z range of interest. Subject
to baseline effects, isotope interferences, mass resolution, and resolution dependence on mass,
peak positions of a few ion fragments are determined either in terms of centroids or peak maxima
through a low-order polynomial fit at the peak top. These peak positions are then fit to the
known peak positions for these ions through either first-order or higher-order polynomials to
calibrate the mass (m/z) axis.

After the mass axis calibration, a typical mass spectral data trace would then be subjected
to peak analysis where peaks (ions) are identified. This peak detection routine is a highly
empirical and compounded process where peak shoulders, noise in the data trace, baselines due to
chemical backgrounds or contamination, isotope peak interferences, etc., are considered.

For the peaks identified, a process called centroiding is typically applied, where an
attempt is made to calculate the integrated peak areas and peak positions. Due to the
many interfering factors outlined above and the intrinsic difficulties in determining peak areas in
the presence of other peaks and/or baselines, this is a process plagued by many adjustable
parameters that can make an isotope peak appear or disappear with no objective measures of the
centroiding quality.

A description will now be given of some of the many disadvantages of the conventional
approaches to processing mass spectrometry data.

One disadvantage is the lack of mass accuracy. The mass calibration currently in use
usually does not provide better than 0.1 amu (m/z unit) in mass determination accuracy on a
conventional MS system with unit mass resolution (ability to visualize the presence or absence of
a significant isotope peak). In order to achieve higher mass accuracy and reduce ambiguity in
molecular fingerprinting such as peptide mapping for protein identification, one has to switch to
an MS system with higher resolution such as quadrupole TOF (qTOF) or FT ICR MS, which
comes at a significantly higher cost.

Another disadvantage is the large peak integration error. Due to the contribution of mass
spectral peak shape, its variability, the isotope peaks, the baseline and other background signals,
and the random noise, current peak area integration has large errors (both systematic and random
errors) for either strong or weak mass spectral peaks.

Yet another disadvantage includes difficulties with isotope peaks. Current approaches do
not have a good way to separate the contributions from various isotopes, which usually give
partially overlapped mass spectral peaks on conventional MS systems with unit mass resolution.
The empirical approaches used either ignore the contributions from neighboring isotope peaks or
over-estimate them, resulting in errors for dominating isotope peaks and large biases for weak
isotope peaks, or even complete omission of the weaker peaks. When ions of multiple charges
are concerned, the situation becomes even worse, due to the reduced separation in m/z mass
unit between neighboring isotope peaks.

Yet still another disadvantage is nonlinear operation. The current approaches use a
multi-stage disjointed process with many empirically adjustable parameters during each stage.
Systematic errors (biases) are generated at each stage and propagated down to the later stages in
an uncontrolled, unpredictable, and nonlinear manner, making it impossible for the algorithms to
report meaningful statistics as measures of data processing quality and reliability.

A further disadvantage is the dominating systematic errors. In most MS applications,
ranging from industrial process control and environmental monitoring to protein identification or
biomarker discovery, instrument sensitivity or detection limit has always been a focus, and great
efforts have been made in many instrument systems to minimize measurement error or noise
contribution in the signal. Unfortunately, the peak processing approaches currently in use create
a source of systematic error even larger than the random noise in the raw data, thus becoming the
limiting factor in instrument sensitivity.

An additional disadvantage is mathematical and statistical inconsistency. The many
empirical approaches currently used make the whole mass spectral peak processing inconsistent
either mathematically or statistically. The peak processing results can change dramatically on
slightly different data without any random noise or on the same synthetic data with slightly
different noise. In other words, the results of the peak processing are not robust and can be
unstable depending on the particular experiment or data collection.

Moreover, another disadvantage is the instrument-to-instrument variations. It has usually
been difficult to directly compare raw mass spectral data from different MS instruments due to
variations in the mechanical, electromagnetic, or environmental tolerances. The current ad
hoc peak processing applied to the raw data only adds to the difficulty of quantitatively
comparing results from different MS instruments. On the other hand, the need for comparing
either raw mass spectral data directly or peak processing results from different instruments or
different types of instruments has been increasingly heightened for the purpose of impurity
detection or protein identification through computer searches in established MS libraries.

Accordingly, it would be desirable and highly advantageous to have methods for
calibrating Mass Spectrometry (MS) instrument systems and for processing MS data that
overcome the above-described deficiencies and disadvantages of the prior art.

SUMMARY OF THE INVENTION

The problems stated above, as well as other related problems of the prior art, are solved
by the present invention, methods for calibrating Mass Spectrometry (MS) instrument systems
and for processing MS data.

According to an aspect of the present invention, there is provided a method for obtaining
at least one calibration filter for a Mass Spectrometry (MS) instrument system. Measured
isotope peak cluster data in a mass spectral range is obtained for a given calibration standard.
Relative isotope abundances and actual mass locations of isotopes corresponding thereto are
calculated for the given calibration standard. Mass spectral target peak shape functions centered
within respective mass spectral ranges are specified. Convolution operations are performed
between the calculated relative isotope abundances and the mass spectral target peak shape
functions to form calculated isotope peak cluster data. A deconvolution operation is performed
between the measured isotope peak cluster data and the calculated isotope peak cluster data after
the convolution operations to obtain the at least one calibration filter.

According to another aspect of the present invention, there is provided a method of
processing raw mass spectral data. A total filtering matrix is applied to the raw mass spectral
data to obtain calibrated mass spectral data. The total filtering matrix is formed by measured
isotope peak cluster data, obtained for a given calibration standard in a mass spectral range. The
total filtering matrix is further formed by relative isotope abundances and actual mass locations
of isotopes corresponding thereto, calculated for a same calibration standard. The total filtering
matrix is further formed by specified mass spectral target peak shape functions centered within
the mass spectral range. The total filtering matrix is further formed by convolution operations
performed between the calculated relative isotope abundances and the mass spectral target peak
shape functions to form calculated isotope peak cluster data. The total filtering matrix is further
formed by a deconvolution operation performed between the measured isotope peak cluster data
and calculated isotope peak cluster data after the convolution operations to obtain at least one
calibration filter for the total filtering matrix.

According to yet another aspect of the present invention, there is provided a method for
analyzing mass spectral peaks corresponding to mass spectral data obtained from a Mass
Spectrometry (MS) instrument system. A weighted regression operation is applied to mass
spectral peaks within a mass spectral range. Regression coefficients are reported as one of
integrated peak areas and mass deviations corresponding to one of nominal masses and estimated
actual masses.

According to still yet another aspect of the present invention, there is provided a method
for calculating calibration filters for a Mass Spectrometry (MS) instrument system. At least one
mass spectral peak shape function is obtained from a given calibration standard. Mass spectral
target peak shape functions centered at mid points within respective mass spectral ranges are
specified. A deconvolution operation is performed between the obtained at least one mass
spectral peak shape function and the mass spectral target peak shape functions. At least one
calibration filter is calculated from a result of the deconvolution operation.

According to a further aspect of the present invention, there is provided a method of
processing raw mass spectral data. A total filtering matrix is applied to the raw mass spectral
data to obtain calibrated mass spectral data. The total filtering matrix is formed by obtaining,
from a given calibration standard, at least one mass spectral peak shape function. The total
filtering matrix is further formed by specifying mass spectral target peak shape functions
centered at mid points within respective mass spectral ranges. The total filtering matrix is further
formed by performing a deconvolution operation between the obtained at least one mass spectral
peak shape function and the mass spectral target peak shape functions. The total filtering matrix
is further formed by calculating at least one calibration filter from a result of the deconvolution
operation.

These and other aspects, features and advantages of the present invention will become 
apparent from the following detailed description of preferred embodiments, which is to be read in 
connection with the accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIGs. 1A, 1B, and 1C are diagrams illustrating a typical mass spectral data trace on
different ion intensity scales 110, 120, and 130, respectively, plotted as a function of m/z ratio,
according to the prior art;

FIGs. 2A and 2B are diagrams illustrating mass spectral peak data for the ion fragment 
C3F5 on two different intensity scales; 

FIGs. 3A and 3B are diagrams respectively illustrating the measured isotope cluster 310
before and after pre-convolution, according to an illustrative embodiment of the present 
invention; 

FIGs. 3C and 3D are diagrams respectively illustrating the calculated isotope cluster 320 
before and after pre-convolution, according to an illustrative embodiment of the present 
invention; 

FIGs. 3E and 3F are diagrams respectively illustrating the derived peak shape function 330
thus calculated and the corresponding deconvolution residual 340, according to an illustrative
embodiment of the present invention;

FIG. 4 is a diagram illustrating exemplary deconvoluted peak shape functions 410,
according to an illustrative embodiment of the present invention;

FIG. 5 is a diagram illustrating exemplary interpolated peak shape functions 510 based on
the deconvoluted peak shape functions 410 of FIG. 4, according to an illustrative embodiment of
the present invention;

FIG. 6 is a diagram illustrating two exemplary targets 610, 620 that satisfy pre-specified 
requirements for mass spectrometry calibration, according to an illustrative embodiment of the 
present invention; 

FIG. 7 is a diagram illustrating a collection 710 of calibration filters calculated for a set
of masses, according to an illustrative embodiment of the present invention;

FIG. 8 is a diagram illustrating a graphical representation 800 of the filter matrix
application combined with interpolations and mass pre-alignment, according to an illustrative
embodiment of the present invention;

FIGs. 9A, 9B, and 9C are diagrams illustrating a first segment 910 and a second segment
920 of a Mass Spectrometry (MS) spectrum before and after full calibration (both FIGs. 9A and
9B) and the variance spectrum 930 (FIG. 9C), according to an illustrative embodiment of the
present invention;

FIG. 10A is a diagram illustrating a stick spectrum 1010 reflecting the t-statistic as a
function of the exact mass locations (Equation 10) for possible mass spectral peaks across the
mass range (raw mass spectrum taken from FIG. 1), according to an illustrative embodiment of
the present invention;

FIGs. 10B and 10C are diagrams illustrating the overlay 1020 of the raw MS spectral
segment and its fully calibrated version 1030, according to an illustrative embodiment of the
present invention;

FIG. 10D is a diagram illustrating the corresponding t-statistic 1040 and a horizontal
cutoff line 1050 with critical t values set at 12, according to an illustrative embodiment of the
present invention;

FIG. 11 is a diagram illustrating a method for operating a Mass Spectrometry (MS)
instrument system, according to an illustrative embodiment of the present invention;

FIG. 12 is a diagram further illustrating step 1110H of the method of FIG. 11, according
to an illustrative embodiment of the present invention;

FIG. 13 is a diagram illustrating a method for analyzing a Mass Spectrometry (MS) 
spectrum obtained from an MS instrument system after the full mass spectral calibration, 
according to an illustrative embodiment of the present invention; 

FIG. 14 is a diagram illustrating a method for analyzing a Mass Spectrometry (MS)
spectrum obtained from an MS instrument system after determination of peak shape functions,
according to an illustrative embodiment of the present invention;

FIG. 15 is a diagram further illustrating the method of FIG. 11 including optional steps for
calibrating a Mass Spectrometry (MS) system, according to an illustrative embodiment of the
present invention; and

FIG. 16 is a diagram illustrating a method for processing a Mass Spectrometry (MS)
spectrum obtained from an MS instrument system, according to an illustrative embodiment of the
present invention.

DETAILED DESCRIPTION OF THE INVENTION

The present invention is directed to methods for calibrating Mass Spectrometry (MS)
instrument systems and for processing MS data.

It is to be understood that the present invention may be implemented in various forms of
hardware, software, firmware, special purpose processors, or a combination thereof. Preferably,
the present invention is implemented as a combination of hardware and software. Moreover, the
software is preferably implemented as an application program tangibly embodied on a program
storage device. The application program may be uploaded to, and executed by, a machine
comprising any suitable architecture. Preferably, the machine is implemented on a computer
platform having hardware such as one or more central processing units (CPU), a random access
memory (RAM), and input/output (I/O) interface(s). The computer platform also includes an
operating system and microinstruction code. The various processes and functions described
herein may either be part of the microinstruction code or part of the application program (or a
combination thereof) that is executed via the operating system. In addition, various other
peripheral devices may be connected to the computer platform such as an additional data storage
device and a printing device.

It is to be further understood that, because some of the constituent system components
and method steps depicted in the accompanying Figures are preferably implemented in software,
the actual connections between the system components (or the process steps) may differ
depending upon the manner in which the present invention is programmed. Given the teachings
herein, one of ordinary skill in the related art will be able to contemplate these and similar
implementations or configurations of the present invention.

A novel approach to processing mass spectrometry data will now be described which
combines mass spectrometer calibration and mass spectral peak analysis into one total calibration
process to address all the issues discussed above. Proper and accurate mass spectrometer
calibration in both mass and peak shape will provide a solid foundation for accurate peak
identification, analyte quantitation, and sample classification during the next stage of mass
spectral data analysis.

A description will now be given of mass spectral calibration according to an illustrative
embodiment of the present invention. The description of mass spectral calibration will include
descriptions relating to the following: mass spectral calibration standard; calculation of relative
isotope abundances; mass pre-alignment; mass spectral peak shape functions; peak shape
function interpolation; calibration filters and their interpolation; application of calibration filters;
and error propagation through calibration filters.

Instead of calibrating mass alone without consideration of mass spectral peak shape and
its mass-dependency, a complete calibration including all of these will be carried out as part of
the overall process. There are a few key steps in this complete calibration process, which will be
discussed in detail below.

The description of a mass spectral calibration standard will now be given according to an
illustrative embodiment of the present invention. A calibration standard that has mass fragments
scattered over the whole mass range will be selected to provide both mass calibration and mass
spectral peak shape information. Due to the presence of naturally occurring isotopes in the
elements that form the standard molecule, typically multiple isotope peaks can be observed for
the same ion fragment at different abundances.

A commonly used standard in gas chromatography-mass spectrometry (GC/MS) is
perfluorotributylamine (formula: C12F27N, molecular weight: 671). It has EI fragments at 69,
100, 119, 131, 169, 219, 264, 364, 414, 464, 502, etc. (see FIG. 1 for an example spectrum).
This standard is typically embedded in a commercial GC/MS instrument so that the molecule can
be readily vaporized and diffuse into the MS system at the time of calibration through a
computer-controlled valve.

Other standards under a variety of ionization schemes include polymers and synthetic 
peptides that can fragment into multiple well-characterized ion fragments covering the mass 
range of interest. In tandem MS systems where a second fragmentation is carried out, for 
example, one can obtain a mass spectrum with regularly spaced mass spectral peaks from a 
parent peptide ion due to the loss of successive amino acids during this secondary fragmentation 
- a well-known process for peptide sequencing. Many intact proteins in ESI mode will carry 
multiple charges (z), sometimes from 1 to 10 or more, which will generate mass spectral peaks 
covering up to one order of magnitude or more in mass (m/z) range. 

The description of the calculation of relative isotope abundances will now be given 
according to an illustrative embodiment of the present invention. On mass spectrometers that do 
not provide complete mass separation between different isotope peaks it is necessary to first 
calculate the relative isotope abundances and their exact mass locations. FIGS. 2A and 2B 
illustrate this limited mass separation between isotope peaks. A few published methods can be 
used to perform this theoretical calculation based on the elemental compositions, the known 
relative abundances of the elements contained in the ion fragment, and the electrical charges. 
Some of these methods are described by Alan Rockwood et al., in Anal. Chem., 1995, 67, 2699,
and by James Yergey, in Int. J. Mass Spec. & Ion. Physics, 1983, 52, 337, the disclosures of both
of which are incorporated by reference herein.

For an ion fragment of the form A_a B_b C_c D_d ..., the isotope distribution is given by:

$$\left(\sum_i a_i A_i\right)^{a}\left(\sum_i b_i B_i\right)^{b}\left(\sum_i c_i C_i\right)^{c}\left(\sum_i d_i D_i\right)^{d}\cdots$$

where a, b, c, d, ... are the numbers of atoms A, B, C, D, ..., respectively, and a_i, b_i, c_i, d_i, ... are
the natural abundances for isotopes A_i, B_i, C_i, D_i, ..., respectively. This expression can be
expanded and re-organized to give the mass locations and abundances of all expected isotopes.
For example, for the ion fragment in FIGs. 2A and 2B, it is known that it has an electrical charge of
one and an elemental composition of C3F5, with the natural abundances for C and F given by:

$$\begin{aligned}
C^{12} &= 12.000000, & c_{12} &= 0.9893 \\
C^{13} &= 13.003354, & c_{13} &= 0.0107 \\
F^{19} &= 18.998403, & f_{19} &= 1.0000
\end{aligned}$$

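The isotope masses (m) and relative abundances (y) for this ion fragment can therefore be
calculated as

$$m = \begin{bmatrix} 3C^{12}+5F^{19} \\ 2C^{12}+C^{13}+5F^{19} \\ C^{12}+2C^{13}+5F^{19} \\ 3C^{13}+5F^{19} \end{bmatrix}
= \begin{bmatrix} 130.992015 \\ 131.995369 \\ 132.998723 \\ 134.002077 \end{bmatrix},
\qquad
y = \begin{bmatrix} 9.6824\times10^{-1} \\ 3.1417\times10^{-2} \\ 3.3979\times10^{-4} \\ 1.2250\times10^{-6} \end{bmatrix}$$
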



Such isotope peak information (both mass locations and relative abundances) will be 
utilized later for the exact and complete calibration of mass spectral data. 
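
As a minimal sketch of the expansion above (not the published algorithms of Rockwood or Yergey), the following Python snippet enumerates the isotope combinations for a singly charged fragment. The function names and the layout of the `ISOTOPES` table are illustrative assumptions; the isotope masses and abundances are the ones quoted in the text, and only elements with one or two isotopes are handled.

```python
from itertools import product
from math import comb  # requires Python 3.8+

# (mass, natural abundance) pairs, copied from the values quoted in the text.
ISOTOPES = {
    "C": [(12.000000, 0.9893), (13.003354, 0.0107)],
    "F": [(18.998403, 1.0000)],
}

def element_distribution(symbol, count):
    """Expand (sum_i a_i A_i)^count for an element with one or two isotopes."""
    iso = ISOTOPES[symbol]
    if len(iso) == 1:
        mass, _ = iso[0]
        return [(count * mass, 1.0)]
    (m_light, p_light), (m_heavy, p_heavy) = iso
    out = []
    for k in range(count + 1):  # k = number of heavy-isotope atoms
        mass = (count - k) * m_light + k * m_heavy
        prob = comb(count, k) * p_light ** (count - k) * p_heavy ** k
        out.append((mass, prob))
    return out

def isotope_distribution(formula):
    """Combine per-element distributions; formula is e.g. {"C": 3, "F": 5}."""
    per_element = [element_distribution(s, n) for s, n in formula.items()]
    peaks = []
    for combo in product(*per_element):
        mass = sum(m for m, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        peaks.append((mass, prob))
    return sorted(peaks)  # ascending mass

for mass, prob in isotope_distribution({"C": 3, "F": 5}):
    print(f"{mass:.6f}  {prob:.4e}")
```

For multiply charged ions, the masses would additionally be divided by the charge; that step is omitted here for brevity.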

The description of mass pre-alignment will now be given according to an illustrative 
embodiment of the present invention. In order to make more accurate peak shape interpolation 
in the next step, it is necessary to pre-align or pre-calibrate the standard mass spectrum first 
based on the identifiable isotope peak clusters across the spectrum. For each isotope peak cluster 
identified, a centroid is calculated as follows:

$$m_0 = \frac{\mathbf{y}_0^T \mathbf{m}_0}{\mathbf{y}_0^T \mathbf{1}}$$

where y0 is a column vector containing the actually measured mass spectral continuum data for
the isotope cluster under consideration and the superscript T denotes transpose, i.e., a row vector
containing all the same elements as the column version, m0 (on the right-hand side) is a column
vector corresponding to the mass axis on which the isotope cluster is measured (can have either
mass units or time units), and 1 is a column vector full of ones with the same length as m0 or y0.
Similarly, another centroid can be calculated based on the calculated isotope distributions as follows:

$$m = \frac{\mathbf{y}^T \mathbf{m}}{\mathbf{y}^T \mathbf{1}}$$

Thus a calibration relationship of the form

$$m = f(m_0) \qquad \text{(Equation 1)}$$

can be established through a least-squares polynomial fit between the centroids measured and the
centroids calculated using all clearly identifiable isotope clusters available in the mass spectral
standard across the mass range.
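
The centroid calculation and the least-squares fit of Equation 1 can be sketched as follows. This is an illustration under stated assumptions rather than the method's reference implementation: NumPy is assumed, the function names are hypothetical, and the centroid values and the instrument distortion in the example are made up for demonstration only.

```python
import numpy as np

def centroid(axis, intensity):
    """Intensity-weighted centroid: (y^T m) / (y^T 1)."""
    axis = np.asarray(axis, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    return float(intensity @ axis / intensity.sum())

def fit_mass_calibration(measured_centroids, calculated_centroids, order=1):
    """Least-squares polynomial m = f(m0), returned as a callable np.poly1d."""
    coeffs = np.polyfit(measured_centroids, calculated_centroids, order)
    return np.poly1d(coeffs)

# Synthetic illustration: the measured axis is offset and slightly stretched.
calculated = np.array([69.0, 131.0, 219.0, 414.0, 502.0])  # made-up centroids
measured = (calculated - 0.2) / 1.0005                      # hypothetical distortion
f = fit_mass_calibration(measured, calculated, order=1)
print(np.round(f(measured), 4))
```

The polynomial order is left as a parameter because, as the text notes, the dominant term depends on the instrument type and on whether the measured axis is in mass or time units.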

Note again that m0 does not have to be in mass units (m/z) but can be in any physical unit
as a function of which ion intensities are measured. In FTMS and TOF, m0 comes naturally in time
units, and the first and second order terms in the polynomial fit become dominant for FTMS and
TOF, respectively.

In MS systems that contain significant background signals due to the presence of either
chemical noise or other particles such as neutrals, it may be beneficial to fit a lower order
baseline using only the collected data before and after the mass spectral peaks of interest and
subtract this baseline contribution from y0 to effect a more accurate determination of the
centroid, m0. It will become obvious later on, however, that it is not critical to have the absolute
mass calibration at this stage due to the refinement that comes with the total calibration filters.
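
A baseline subtraction of this kind might look like the sketch below. NumPy is assumed, and the function name and the way the peak window is specified (`peak_lo`, `peak_hi`) are hypothetical choices made for illustration.

```python
import numpy as np

def subtract_baseline(axis, intensity, peak_lo, peak_hi, order=1):
    """Fit a low-order polynomial to the data outside [peak_lo, peak_hi]
    and subtract it from the whole trace."""
    axis = np.asarray(axis, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    outside = (axis < peak_lo) | (axis > peak_hi)  # points before/after the peaks
    coeffs = np.polyfit(axis[outside], intensity[outside], order)
    return intensity - np.polyval(coeffs, axis)
```

The corrected trace can then be used in place of y0 when the centroid is computed.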

The description of mass spectral peak shape functions will now be given according to an
illustrative embodiment of the present invention. For each mass spectral peak cluster (including
all significant isotope peaks) identified, such as the one shown in FIGs. 2A and 2B, a mass
spectral peak shape function at this mass can be derived through the following deconvolution:

$$y_0 = y \otimes p$$

where y0 is the actually measured isotope peak cluster, y is the theoretically calculated isotope
distribution for the particular ion fragment around this mass, and p is the peak shape function to
be calculated. While y0 is an actually measured mass spectrum continuously sampled in a given
mass window and can be easily converted through interpolation onto equally spaced mass
intervals, the theoretically calculated isotope distribution is defined only on discrete and
irregularly-spaced masses, such as the (m, y) shown above.

A key step in making this deconvolution possible is numerically convolving a narrow
Gaussian peak g with both y0 and y before the deconvolution, i.e.,

$$(g \otimes y_0) = (g \otimes y) \otimes p \quad \text{or} \quad y_0' = y' \otimes p \qquad \text{(Equation 2)}$$

This pre-convolution allows for continuously sampling both $y_0$ and $y$ onto the same equally
spaced mass intervals. In order to minimize noise propagation through this pre-convolution, it is
important to use a Gaussian peak whose peak width is several times (for example, 4 times)
smaller than the FWHM of an individual isotope peak. FIGs. 3A and 3B are diagrams
respectively illustrating the measured isotope cluster 310 before and after pre-convolution,
according to an illustrative embodiment of the present invention. FIGs. 3C and 3D are diagrams
respectively illustrating the calculated isotope cluster 320 before and after pre-convolution,
according to an illustrative embodiment of the present invention. The pre-convolution can be
accomplished through either matrix multiplication or Fast Fourier Transform (FFT) with zero
filling, both well established in the open literature, for example, by William Press et al., in
Numerical Recipes in C, 2nd Ed., 1992, Cambridge University Press, p. 537, the entire disclosure
of which is incorporated by reference herein.
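
The pre-convolution step can be illustrated with a minimal sketch, which is not the implementation of the described method. It assumes a Gaussian g with unit area, evaluates g convolved with the stick-like theoretical distribution directly on the uniform grid, and uses a plain discrete convolution for the measured trace. The function names, the grid spacing, and the two-line isotope cluster are hypothetical.

```python
import numpy as np

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))

def preconvolve_sticks(masses, abundances, grid, g_fwhm):
    """Sample g (*) y on `grid`, where y is a stick spectrum at irregular masses."""
    sigma = g_fwhm * FWHM_TO_SIGMA
    diff = grid[:, None] - np.asarray(masses)[None, :]
    g = np.exp(-0.5 * (diff / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return g @ np.asarray(abundances)

def preconvolve_sampled(y0, step, g_fwhm):
    """Sample g (*) y0 for a measured trace y0 already on a uniform grid."""
    sigma = g_fwhm * FWHM_TO_SIGMA
    half = int(np.ceil(5.0 * sigma / step))      # truncate the kernel at 5 sigma
    x = np.arange(-half, half + 1) * step
    g = np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return np.convolve(y0, g * step, mode="same")

# Hypothetical two-line isotope cluster; masses and abundances are made up.
step = 1.0 / 64.0
grid = np.arange(66.0, 73.0, step)
masses = np.array([68.9952, 69.9986])
abundances = np.array([1.0, 0.011])
g_fwhm = 0.125                                   # e.g. 4 times narrower than a 0.5 amu peak
y_prime = preconvolve_sticks(masses, abundances, grid, g_fwhm)
```

Because g has unit area, the pre-convolved trace keeps the total abundance of the sticks while placing both y and y0 on the same equally spaced grid.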

Similar to pre-convolution, the deconvolution of $y'$ from $y_0'$ to obtain peak function $p$
can be accomplished through either matrix inversion or FFT division. Due to the banded nature
of the matrix, efficient computational algorithms are available from the open literature for the
matrix inversion. Such algorithms are further described by Gene Golub et al., in Matrix
Computations, 1989, Johns Hopkins University Press, p. 149, the entire disclosure of which is
incorporated by reference herein. Alternatively, the efficient deconvolution can also be carried
out through FFT division. In either case, it is critical to have proper noise filtering in place to
control the noise propagation during the deconvolution process. This can be accomplished by
discarding small singular values in the matrix approach before inversion or by replacing the real
and imaginary part of the FFT division with interpolated values whenever division by a small
number is encountered. The discarding of small singular values is further described by
Yongdong Wang et al., in Anal. Chem., 1991, 63, 2750 and by Bruce Kowalski et al., in J.
Chemometrics, 1991, 5, 129, the disclosures of both of which are incorporated by reference
herein. FIGs. 3E and 3F are diagrams respectively illustrating the derived peak shape function
330 thus calculated and the corresponding deconvolution residual 340, according to an
illustrative embodiment of the present invention. It is desired to have the proper noise filtering
in place during the deconvolution such that the residual after the deconvolution is of a random
nature with magnitude comparable to the expected noise level in the measured data $y_0$.
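
The matrix route with small singular values discarded might look like the following sketch. It is an illustration under stated assumptions, not the described implementation: the convolution is written as a dense banded matrix, the cutoff `rel_tol` is a hypothetical tuning parameter, and the data are synthetic and noise-free.

```python
import numpy as np

def convolution_matrix(y, n_p):
    """Banded matrix Y such that Y @ p == np.convolve(y, p) (full convolution)."""
    Y = np.zeros((len(y) + n_p - 1, n_p))
    for j in range(n_p):
        Y[j:j + len(y), j] = y
    return Y

def deconvolve_tsvd(y0_conv, y_conv, n_p, rel_tol):
    """Solve y0' = y' (*) p for p, discarding singular values below rel_tol * max."""
    Y = convolution_matrix(y_conv, n_p)
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    keep = s > rel_tol * s[0]
    p = Vt[keep].T @ ((U[:, keep].T @ y0_conv) / s[keep])
    return p, y0_conv - Y @ p                     # peak shape and fitting residual

def gauss(x, center, fwhm):
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

# Synthetic, noise-free check with made-up numbers.
step = 1.0 / 32.0
x_y = np.arange(-32, 65) * step                   # -1 .. 2 amu
y_conv = gauss(x_y, 0.0, 0.125) + 0.3 * gauss(x_y, 1.0, 0.125)   # plays the role of y'
x_p = np.arange(-32, 33) * step                   # -1 .. 1 amu
p_true = gauss(x_p, 0.0, 0.5)
p_true /= p_true.sum()
y0_conv = np.convolve(y_conv, p_true)             # plays the role of y0'
p_est, residual = deconvolve_tsvd(y0_conv, y_conv, len(p_true), rel_tol=1e-6)
```

With noisy data a larger `rel_tol` would normally be chosen, raised until the residual looks random and is comparable in size to the expected noise.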

In MS systems that contain significant background signals due to the presence of either
chemical noise or other particles such as neutrals, it may be beneficial to fit a lower order
baseline using only the collected data before and after the mass spectral peaks of interest and
subtract this baseline contribution from $y_0$ before the pre-convolution. The purpose of this
baseline correction is to ensure that the baseline of the actually measured data matches that of the
theoretically calculated distribution.

The description of peak shape function interpolation will now be given according to an
illustrative embodiment of the present invention. A few other peak shape functions can be
calculated similarly from other well-characterized ion fragments across the mass spectral peak
range from the mass spectrum of the same standard sample. FIG. 4 is a diagram illustrating
exemplary deconvoluted peak shape functions 410, according to an illustrative embodiment of
the present invention. In order to obtain peak shape functions for all other masses of interest
within the mass spectral range, an interpolation on the few calculated peak shape functions will
be required. An efficient interpolation algorithm that also allows for noise filtering is devised.
Instead of interpolation in the original mass spectral space, these few available mass peak shape
functions will be collected in a matrix P to be decomposed through Singular Value
Decomposition (SVD) first,

$$P = U S V^T$$

where P is the peak shape function matrix with peak shape functions arranged in rows, U
contains the left singular vectors in its columns, S is a diagonal matrix with descending singular
values on the diagonal, and V contains the right singular vectors in its columns. The SVD algorithm
has been described by Gene Golub et al., in Matrix Computations, Johns Hopkins University
Press, p. 427, the entire disclosure of which is incorporated by reference herein. Usually only a
few (such as 3 to 4) singular values/vectors would be significant, depending on the consistency
of peak shape functions as a function of mass. For example, if all peak shape functions are
exactly the same with only minor mass shifts among them, one expects only two significant
singular values/vectors. If all peak shape functions are identical to each other with no mass shift,
one would expect only one singular value/vector. This explains why a pre-alignment step is
needed above in order to result in a more economic decomposition and interpolation with
a minimal number of singular values/vectors involved.

When the elements of the left singular vectors are plotted against the mass, one expects a
smooth dependence on the mass, a functional dependence amenable to accurate interpolation.
A cubic spline interpolation can be easily applied to the first few columns in matrix U to obtain
an expanded matrix $\tilde{U}$ with many more rows that cover the full mass spectral range.
An expanded peak shape function matrix $\tilde{P}$ containing interpolated peak shape functions can be
easily constructed via

$$\tilde{P} = \tilde{U} S V^T$$

where each row in $\tilde{P}$ contains one peak shape function at any interpolated mass centroid. FIG. 5
is a diagram illustrating exemplary interpolated peak shape functions 510 based on the
deconvoluted peak shape functions 410 of FIG. 4, according to an illustrative embodiment of the
present invention.
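
A minimal sketch of this SVD-based interpolation follows. It makes several assumptions not taken from the description: the peak shapes are synthetic Gaussians that are already pre-aligned, only three singular components are kept, and linear interpolation (`np.interp`) stands in for the cubic spline named above so that the example depends on NumPy alone.

```python
import numpy as np

def interpolate_peak_shapes(P, masses, new_masses, n_keep=3):
    """Interpolate rows of P (one peak shape per mass) via a truncated SVD."""
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    r = min(n_keep, len(s))
    # Interpolate each retained left singular vector against mass. Linear
    # interpolation is used here in place of a cubic spline.
    U_new = np.column_stack(
        [np.interp(new_masses, masses, U[:, j]) for j in range(r)]
    )
    return (U_new * s[:r]) @ Vt[:r]               # expanded U times S times V^T

def unit_gaussian(x, sigma):
    g = np.exp(-0.5 * (x / sigma) ** 2)
    return g / g.sum()

def width(m):
    """Hypothetical slow growth of peak width with mass."""
    return 1.0 + 2.0e-4 * m

x = np.linspace(-5.0, 5.0, 101)                   # points across one peak window
masses = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
P = np.vstack([unit_gaussian(x, width(m)) for m in masses])
new_masses = np.arange(100.0, 501.0, 50.0)
P_new = interpolate_peak_shapes(P, masses, new_masses)
```

Dropping the trailing singular components is what provides the noise filtering: components that carry mostly noise never enter the interpolated peak shapes.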

It should be pointed out that the SVD decomposition here can also be replaced with other
decompositions, such as wavelet decompositions, to arrive at similar results at a different
computational cost.

The description of calibration filters and their interpolation will now be given according
to an illustrative embodiment of the present invention. With the peak shape functions obtained,
the MS instrument system is now fully characterized both in terms of its mass axis and its peak
shape functions. Based on this characterization, a full mass spectral calibration can now be
performed. This calibration will be carried out in a single operation where the peak shape
functions at different masses will be converted into more desirable peak shape functions centered
at exact mass locations (target peak shape functions). While any analytically or numerically
calculated peak shape functions can in principle serve as target peak shape functions, it is
desirable to have targets with the following properties: smooth peak functions and derivatives
(for numerical stability); analytically calculable functions and derivatives (for computational
efficiency); symmetrical peak shapes (for accurate mass determination in later peak detection);
resemblance to the true mass spectral peak shape (for simplified calibration filters); peak width
(FWHM) slightly larger than actually measured peak width (for computational stability and
signal averaging).

FIG. 6 is a diagram illustrating two exemplary targets 610, 620 that satisfy pre-specified
requirements for mass spectrometry calibration, according to an illustrative embodiment of the
present invention. The two exemplary targets 610 and 620 satisfy the requirements described
above. The two exemplary targets 610 and 620 are a Gaussian and the convolution of a Gaussian
and a boxcar, respectively.

For each peak shape function $p$ at a given centroid mass, a calibration filter $f$ can be
found such that:

$$t = p \otimes f$$ (Equation 3)

where $t$ is the target peak shape function centered at this given mass. This convolution would
essentially convert the numerically calculated peak shape function $p$ into a mathematically
defined peak shape function centered at this exact mass location, accomplishing both mass and
peak shape calibration in one convolution operation. The calculation of calibration filter $f$ can be
carried out in similar fashion to the deconvolution of peak shape functions through either matrix
inversion or FFT division with appropriate noise filtering built-in. FIG. 7 is a diagram
illustrating a collection 710 of calibration filters calculated for a set of masses, according to an
illustrative embodiment of the present invention.
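
The FFT-division route for Equation 3 can be sketched as follows, under stated assumptions. Frequencies at which the transform of p is very small are simply set to zero, which is a simplification of the interpolation-based guard described earlier. The threshold `rel_floor`, the window, and the Gaussian shapes are hypothetical.

```python
import numpy as np

def unit_gaussian(x, center, fwhm):
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    g = np.exp(-0.5 * ((x - center) / sigma) ** 2)
    return g / g.sum()

def calibration_filter_fft(p, t, rel_floor=1e-8):
    """Solve t = p (*) f by guarded FFT division (circular convolution).

    p and t are sampled on the same odd-length window with the exact mass
    at the middle sample.
    """
    Pf = np.fft.fft(np.fft.ifftshift(p))
    Tf = np.fft.fft(np.fft.ifftshift(t))
    ok = np.abs(Pf) > rel_floor * np.abs(Pf).max()
    Ff = np.zeros_like(Tf)
    Ff[ok] = Tf[ok] / Pf[ok]                      # skip divisions by small numbers
    return np.fft.fftshift(np.real(np.fft.ifft(Ff)))

step = 1.0 / 16.0
x = np.arange(-64, 65) * step                     # +/- 4 amu window, odd length
p = unit_gaussian(x, 0.05, 0.5)                   # measured shape, 0.05 amu off-center
t = unit_gaussian(x, 0.0, 0.6)                    # target: centered, slightly wider
f = calibration_filter_fft(p, t)
t_check = np.convolve(p, f, mode="same")          # should reproduce the target
```

In this example the filter both widens the peak slightly and moves it back onto the exact mass, which is the combined mass and peak shape calibration described above.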

It can be seen that the calibration filters vary smoothly with mass, similar to the peak
shape functions. Since interpolation is computationally more efficient than the deconvolution
operation in general, it may be computationally advantageous to calculate the calibration filters
at coarsely spaced masses across the whole range (for example, at every 1-5 amu spacing) and
interpolate the calibration filters onto a finely spaced grid afterwards (for example, 1/8 or 1/16
amu). The same approach described above for the interpolation of peak shape functions can be
applied. Alternatively, one can bypass the calculations of peak shape functions in Equation 2
altogether and combine Equations 2 and 3 into a single-step process:

$$t \otimes y = y_0 \otimes f$$

where the convolution filters $f$ at multiple standard masses can be calculated directly via matrix
inversion or FFT division. An interpolation on these convolution filters will produce desired
filters at specific masses (FIG. 7).

It should be noted that the calibration filters calculated here would serve two purposes
simultaneously: the calibration of mass spectral peak shapes and mass spectral peak locations.
Since the mass axis has already been pre-calibrated above, the mass calibration part of the filter
function is reduced in this case to achieve a further refinement on mass calibration, i.e., to
account for any residual mass errors after the polynomial fit given by Equation 1.

This total calibration process should work well for quadrupole-type MS including ion
traps where mass spectral peak width (Full Width at Half Maximum or FWHM) is expected to be
roughly consistent within the operating mass range. For other types of mass spectrometer
systems such as magnetic sectors, TOF, or FTMS, the mass spectral peak shape is expected to
vary with mass in a relationship dictated by the operating principle and/or the particular
instrument design. While the same mass-dependent calibration procedure described so far is still
applicable, one may prefer to perform the total calibration in a transformed data space consistent
with a given relationship between the peak width/location and mass.

In the case of TOF, it is known that mass spectral peak width (FWHM) $\Delta m$ is related to
the mass ($m$) in the following relationship:

$$\Delta m = a\sqrt{m}$$

where $a$ is a known calibration coefficient. In other words, the peak width measured across the
mass range would increase with the square root of the mass. With a square root transformation
to convert the mass axis into a new function as follows:

$$m' = \sqrt{m}$$

where the peak width (FWHM) as measured in the transformed mass axis is given by

$$\frac{\Delta m}{2\sqrt{m}} = \frac{a}{2}$$

which will remain unchanged throughout the spectral range.

For an FTMS instrument, on the other hand, the peak width (FWHM) $\Delta m$ will be directly
proportional to the mass and therefore a logarithm transformation will be needed:

$$m' = \ln(m)$$

where the peak width (FWHM) as measured in the transformed log-space is given by

$$\ln(m + \Delta m) - \ln(m) = \ln\left(1 + \frac{\Delta m}{m}\right) \approx \frac{\Delta m}{m}$$

which will be fixed independent of the mass. Typically in FTMS, $\Delta m/m$ can be managed on the
order of $10^{-5}$, i.e., $10^{5}$ in terms of the resolving power $m/\Delta m$.

For a magnetic sector instrument, depending on the specific design, the spectral peak
width and the mass sampling interval usually follow a known mathematical relationship with
mass, which may lend itself to a particular form of transformation through which the expected mass
spectral peak width would become independent of mass, much like the way the square root and
logarithm transformations do for TOF and FTMS.

When the expected mass spectral peak width becomes independent of the mass, due
either to the appropriate transformation such as logarithmic transformation on FTMS and square
root transformation on TOF-MS or the intrinsic nature of a particular instrument such as a well
designed and properly tuned quadrupole or ion trap MS, huge savings in computational time will
be achieved with a single calibration filter applicable to the full mass spectral range. This would
also simplify the requirement on the mass spectral calibration standard: a single mass spectral
peak would be required for the calibration with additional peak(s) (if present) serving as check or
confirmation only, paving the way for complete mass spectral calibration of each and every MS
based on an internal standard added to each sample to be measured.
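
The effect of the square root and logarithm transformations described above can be checked numerically with a short sketch. The coefficient `a`, the relative width `r`, and the test masses are made-up values chosen only for illustration.

```python
import numpy as np

def fwhm_in_transformed_axis(m, dm, transform):
    """Width of a peak of FWHM dm centered at m, measured after transforming the mass axis."""
    return transform(m + dm / 2.0) - transform(m - dm / 2.0)

a = 0.01        # hypothetical TOF coefficient in dm = a * sqrt(m)
r = 1.0e-5      # hypothetical FTMS relative width dm / m
masses = np.array([100.0, 400.0, 1600.0])

# TOF: width grows as sqrt(m) in mass space, but is about a / 2 in sqrt(m) space.
tof_widths = fwhm_in_transformed_axis(masses, a * np.sqrt(masses), np.sqrt)

# FTMS: width grows linearly with m in mass space, but is about dm / m in ln(m) space.
ftms_widths = fwhm_in_transformed_axis(masses, r * masses, np.log)
```

Both arrays come out essentially constant across a sixteen-fold change in mass, which is what allows a single calibration filter to serve the whole transformed axis.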

The description of the application of the calibration filters will now be given according to 
an illustrative embodiment of the present invention. 

The calibration filters calculated above can be arranged into the following banded
diagonal filter matrix:

$$F = \begin{bmatrix} f_1 & & & \\ & f_2 & & \\ & & \ddots & \\ & & & f_n \end{bmatrix}$$

in which each short column vector on the diagonal, $f_i$, is taken from the convolution filter
calculated above for the corresponding center mass. The elements in $f_i$ are taken from the
elements of the convolution filter in reverse order, i.e.,

$$f_i = \begin{bmatrix} f_{i,m} & f_{i,m-1} & \cdots & f_{i,1} \end{bmatrix}^T$$

This calibration matrix will have a dimension of 8,000 by 8,000 for a quadrupole MS with mass
coverage up to 1,000 amu at 1/8 amu data spacing. Due to its sparse nature, however, typical
storage requirement would only be around 40 by 8,000 with an effective filter length of 40
elements covering a 5-amu mass range.
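
A small sketch of how such a banded matrix could be assembled is given below. It is illustrative only: the matrix is stored densely for clarity, the filter has odd length and is centered on the diagonal, and the same made-up five-element filter is reused for every mass.

```python
import numpy as np

def filter_matrix(filters):
    """Banded matrix F whose column i holds filter i in reverse order, centered on the diagonal.

    filters has shape (n, L) with L odd; row i is the convolution filter for output point i.
    """
    n, L = filters.shape
    h = L // 2
    F = np.zeros((n, n))
    for i in range(n):
        for k in range(L):
            j = i - k + h                         # row index runs opposite to filter index
            if 0 <= j < n:
                F[j, i] = filters[i, k]
    return F

rng = np.random.default_rng(0)
f = np.array([0.1, 0.3, 0.4, 0.15, 0.05])         # asymmetric, made-up filter
n = 40
F = filter_matrix(np.tile(f, (n, 1)))
s = rng.random(n)                                 # stands in for a raw spectrum
s0 = s @ F                                        # row vector times filter matrix
```

With identical filters in every column, the product `s @ F` reproduces an ordinary convolution of the spectrum with the filter. Only the few nonzero entries per column need to be stored, which is where the 40 by 8,000 storage figure comes from.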

FIG. 8 is a diagram illustrating a graphical representation 800 of the filter matrix
application combined with interpolations and mass pre-alignment, according to an illustrative
embodiment of the present invention. There are three components to the total calibration:
Pre-calibration matrix A; Calibration matrix F; and Post-calibration matrix B.

Pre-calibration matrix A takes on the form of a banded diagonal with each nonzero
column along the diagonal performing what is essentially an interpolation function. This interpolation
function can include: (a) conversion from non-uniformly spaced raw MS data into uniformly-
spaced MS data; (b) pre-alignment of the mass axis; and (c) proper transformations for TOF,
FTMS, or magnetic sector instruments.

Calibration matrix F is a banded diagonal matrix to perform both peak shape and mass 
axis calibration. 

Post-calibration matrix B, similar to pre-calibration matrix A, takes on the form of a
banded diagonal with each nonzero column along the diagonal performing another interpolation
function. This interpolation function can include: (a) conversion from the internal uniform
spacing into either uniform or nonuniform reported spacing; and (b) transform back into the
linear mass space for TOF, FTMS, or magnetic sector instruments.

The factorization shown in FIG. 8 is made possible by Lagrange interpolation where the
interpolation can be structured as a filtering operation independent of the y-values on which the
interpolation operates. The Lagrange interpolation algorithm is described by William Press et al., in
Numerical Recipes in C, 2nd Ed., 1992, Cambridge University Press, p. 105, the entire disclosure
of which is incorporated by reference herein. On instruments that output raw mass spectrum at
predefined mass intervals, all three matrices can be pre-calculated as part of the calibration
process and multiplied beforehand into an overall filtering matrix

$$F_1 = A F B$$

which will have a banded structure similar to F with different elements. At runtime for each
mass spectrum acquired, only one sparse matrix multiplication is required

$$s_0 = s F_1$$

where $s$ is a row vector containing raw MS data and $s_0$ is another row vector containing fully
calibrated MS data at desired output spacing. The real time portion of this operation is expected
to be computationally efficient as it basically filters the raw un-calibrated data into fully
calibrated MS data for output. On some MS instruments, each mass spectrum is acquired at
different and non-uniform mass intervals. In this case, the pre-calibration matrix A is different
for each acquisition, with only F and B matrices fixed before the next calibration. These two
matrices can be pre-multiplied with the following real time operation

$$s_0 = s A (F B)$$

which will be computationally more expensive due to the extra interpolation or multiplication
step for each acquisition.

It should be noted that in some instrument systems, it may be possible to carry out the full
mass spectral calibration on each individual sample on-the-fly. For example, on FTMS or TOF,
after the logarithm or square root transformation, only one deconvolution sequence is required
for an MS peak (internal standard peak) through Equations 2 and 3 to construct a new banded
diagonal matrix F with the identical nonzero elements contained in each column while both A
and B may be kept unchanged. The full calibration thus developed could then be applied to the
same original MS spectrum to effect a full calibration on all peaks (including the internal
standard peak and other unknown peaks to be analyzed). The same on-the-fly calibration can be
applied to other MS systems where the peak shape functions are effectively independent of the
mass, requiring the minimum of one MS peak located anywhere within the mass range as the
internal standard on which to derive the filter matrix F with identical nonzero elements along
each column. The internal standard will be a selected compound having well characterized
isotope clusters and can be added to each unknown sample during sample preparation steps
beforehand or infused and mixed online with an unknown sample in real time.

One may carry out some parts of this full calibration through an updating algorithm to
combine external standards (through a different MS acquisition) with internal standards (within
the same MS acquisition) in a computationally efficient way. For example, one may apply the
last available full calibration based on the most recently measured external standard to an
unknown sample containing an internal standard peak. By checking the exact mass location and
the peak shape of the internal standard after the calibration (see next section below for peak
analysis), one may find that the peak shape has not changed and there exists only a minor mass
shift. As a result, FB could be kept the same requiring only a small update on matrix A, which is
fully capable of shift compensation.

The description of error propagation through the calibration filters will now be given
according to an illustrative embodiment of the present invention.

In order to properly identify and quantify mass spectral peaks, it is important to estimate
the variance in the calibrated MS data. For the majority of MS instruments, the random error on
ion intensity measurement is dominated by ion counting shot noise, i.e., the variance in raw MS
data is proportional to the ion signal itself. The variance spectrum of the calibrated MS spectrum
$s_0$ is therefore given by

$$\sigma^2 \propto s F_2$$ (Equation 4)

where $F_2$ is the same size as $F_1$ with all corresponding elements in $F_1$ squared. This turns out to
be just one more filtering on the same raw MS data with all filter elements squared.
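
Equation 4 can be checked against full covariance propagation with a short sketch. The overall filter matrix used here is a made-up three-point smoothing filter, and the raw intensities are arbitrary. The shared proportionality constant between variance and signal is taken as one.

```python
import numpy as np

def calibrated_variance(s, F1):
    """Variance of s0 = s @ F1, up to a shared constant, when var(s_j) is proportional to s_j."""
    return s @ (F1 ** 2)

# Made-up overall filter matrix: a 3-point smoothing filter along the diagonal.
n = 6
F1 = np.zeros((n, n))
for i in range(n):
    for j, wgt in zip((i - 1, i, i + 1), (0.25, 0.5, 0.25)):
        if 0 <= j < n:
            F1[j, i] = wgt

s = np.array([4.0, 9.0, 100.0, 400.0, 100.0, 9.0])   # arbitrary raw intensities
var_s0 = calibrated_variance(s, F1)

# Cross-check: full covariance of s0 for independent raw points is F1^T diag(s) F1.
cov_s0 = F1.T @ np.diag(s) @ F1
```

The diagonal of `cov_s0` equals `var_s0`. Its off-diagonal entries are nonzero, meaning that neighbouring points of the calibrated spectrum carry correlated noise.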

FIGs. 9A, 9B, and 9C are diagrams illustrating a first segment 910 and a second segment
920 of a Mass Spectrometry (MS) spectrum before and after full calibration (both FIGs. 9A and
9B) and the variance spectrum 930 (FIG. 9C), according to an illustrative embodiment of the
present invention.

A description will now be given of mass spectral peak analysis according to an
illustrative embodiment of the present invention. The description of mass spectral peak analysis
will include descriptions relating to the following: peak matrix construction; Weighted Multiple
Linear Regression (WMLR); detection of significant peaks; and refinement for peak analysis.

An MS spectrum after full calibration described above would be ideally suited for
efficient, reliable, and highly sensitive peak detection. As will become clear later in this section,
while peak analysis can be carried out in either the natural mass unit or the transformed unit (for
FTMS or TOF or magnetic sector instruments), significant computational savings can be
achieved by performing the mass spectral peak analysis in a transformed space (also referred to
herein as "calibrated space") where peak shape functions are of the same width across the full
mass range.

The description of peak matrix construction will now be given according to an illustrative
embodiment of the present invention. The peak analysis problem is formulated as follows: a
mass spectral trace is a linear combination of many peaks of known peak shapes located
nominally 1/z mass unit apart with peak center offsets reflecting mass defects. For singly
charged ions (z = 1), the nominal spacing would be 1 mass unit apart with some offsets in either
positive or negative directions. The mass spectral peak analysis problem can then be formulated
as a Multiple Linear Regression (MLR):

$$s_0 = c P + e$$ (Equation 5)

where $s_0$ is a row vector containing the fully calibrated MS spectrum, $P$ is the peak component
matrix containing nominally spaced known peak functions (each with analytically integrated area
of unity) in its rows, $c$ is a row vector containing the integrated peak intensities for all nominally
spaced peaks, and $e$ is the fitting residual. To account for baseline contributions, baseline
components such as offset, 1st order linear term or other higher order nonlinear functional forms
can be added into the rows of the $P$ matrix with the corresponding row vector $c$ augmented by
the corresponding coefficients to represent the contributions (if any) of these baseline
components.

Note that the full mass spectral calibration described above allows for analytically
calculating the peak component matrix P in which all peaks would integrate to unit area
analytically, leading to the corresponding estimates in c automatically reporting analytically
integrated area, free from the interferences from other peaks (such as other isotope peaks)
located nearby with automatic noise filtering and signal averaging (left in e). For the very same
reason, it is also possible to perform unbiased isotope ratio measurement between nearby isotope
peaks.

Furthermore, the construction of peak component matrix P can be made computationally
more efficient by performing the above full MS calibration to output calibrated MS data at an
exact fraction of the nominal mass spacing, for example, at 1/4, 1/5, 1/8, 1/10, 1/12, 1/16 of 1
amu. This way, the peak shape function will only need to be evaluated once for one row in P
with other rows formed by simply shifting this row forward or backward.

The description of Weighted Multiple Linear Regression (WMLR) will now be given
according to an illustrative embodiment of the present invention. Since the error term e does not
have uniform variance across the mass spectral range as indicated in the calibration section, a
Weighted Multiple Linear Regression (WMLR) will need to be performed instead of the
ordinary MLR,

$$s_0\,\mathrm{diag}(w) = c P\,\mathrm{diag}(w) + e$$ (Equation 6)

where diag(w) is a diagonal matrix with the weights along the diagonal given by Equation 4,

$$w = 1/\sigma^2 = 1/(s F_2)$$

where the shared proportional constant among all masses has been dropped with no impact on
the regression.

A least squares solution to Equation 6 will give

$$c = s_0\,\mathrm{diag}(w) P^T [P\,\mathrm{diag}(w) P^T]^{-1}$$ (Equation 7)

and its variance estimated as

$$s^2\{c\} = e^2\,\mathrm{diag}\{[P\,\mathrm{diag}(w) P^T]^{-1}\}$$ (Equation 8)

where $e^2$ is based on the weighted squared deviations

$$e^2 = e\,\mathrm{diag}(w)\,e^T / df$$

with $e$ given by the fitting residual in Equation 5 and $df$ being the degrees of freedom, defined as
the difference between the number of independent mass spectral data points and the number of
rows included in matrix P (number of coefficients in c to be estimated). The least squares
solution to Equation 6 is further described by John Neter et al., in Applied Linear Regression, 2nd
Ed., Irwin, 1989, p. 418, the entire disclosure of which is incorporated by reference herein.
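
Equations 7 and 8 can be sketched in a few lines of dense linear algebra. This is an illustration on synthetic data, not the sparse implementation discussed in the description. The peak positions, widths, areas, noise level, and the use of the noise-free signal to form the weights are all assumptions made for the example.

```python
import numpy as np

def wmlr(s0, P, w):
    """Weighted least squares for s0 = c P + e in the row-vector convention."""
    PW = P * w                                    # P diag(w)
    M_inv = np.linalg.inv(PW @ P.T)               # [P diag(w) P^T]^-1
    c = (s0 * w) @ P.T @ M_inv                    # Equation 7
    e = s0 - c @ P                                # fitting residual
    df = s0.size - P.shape[0]                     # degrees of freedom
    e2 = (e * w) @ e / df                         # weighted mean squared deviation
    s_c = np.sqrt(e2 * np.diag(M_inv))            # square root of Equation 8
    return c, s_c, e

def unit_area_gaussian(x, center, fwhm):
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((x - center) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

x = np.arange(8.0, 14.0, 0.125)                   # 1/8 amu spacing
P = np.vstack([unit_area_gaussian(x, m, 0.6) for m in (10.0, 11.0, 12.0)]
              + [np.ones_like(x)])                # last row: baseline offset
c_true = np.array([100.0, 0.0, 25.0, 2.0])        # made-up areas and offset
clean = c_true @ P
rng = np.random.default_rng(1)
noisy = clean + rng.normal(0.0, np.sqrt(0.01 * clean))   # variance proportional to signal
w = 1.0 / clean                                   # weights, shared constant dropped
c_est, s_c, e = wmlr(noisy, P, w)
t_stat = c_est / s_c                              # ratio of each estimate to its standard deviation
```

Because each peak row has unit area, the entries of `c_est` are directly the integrated peak areas, and `s_c` gives a standard deviation for each of them.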

For an MS instrument with mass range reaching 1,000 amu with mass interval of 1/8
amu, the peak component matrix P will typically be 1,000 by 8,000 but largely sparse with no
more than 40 nonzero elements (covering 5-amu mass range) in each peak row (baseline
components have all nonzeros in the corresponding rows). The data storage efficiency can be
drastically enhanced through indexing to take advantage of the fact that the peak components are
merely shifted versions of each other when sampled at exact fractions of a nominal mass interval.
Computationally, gains can be had by pre-calculating both $s_0\,\mathrm{diag}(w) P^T$ and $[P\,\mathrm{diag}(w) P^T]$
separately through sparse matrix operation. The pre-calculation of the latter term should result in
another sparse symmetrical matrix of dimension 1,000 by 1,000 but with diagonal band-width of
about 120 (nonzero elements in each row) and half band-width of about 60 (considering the symmetry) in
the above example.

In the absence of baseline components with identical and symmetrical peak shape
functions across the whole mass range, the above operation will lead to a sparse matrix
$[P\,\mathrm{diag}(w) P^T]$ which will have a block cyclic structure amenable to a computationally efficient
inversion into $[P\,\mathrm{diag}(w) P^T]^{-1}$ through block cyclic reduction. Block cyclic reduction is
described by Gene Golub et al., in Matrix Computations, 1989, Johns Hopkins University Press,
p. 173, and by William Press et al., in Numerical Recipes in C, 2nd Ed., 1992, Cambridge
University Press, p. 71, the disclosures of both of which are incorporated by reference herein.

Even in the presence of baseline components with varying and non-symmetrical peak
shape functions across the mass range, the sparse matrix $[P\,\mathrm{diag}(w) P^T]$ will have the following
special form (assuming three baseline components from offset, 1st to 2nd order, for example):

    X X X X X X X
    X X X X X X X
    X X X X X X X
    X X X X X
    X X X X X X
    X X X   X X X
    X X X     X X

which can be solved efficiently as a block diagonal system. Block diagonal systems are
described by Gene Golub et al., in Matrix Computations, 1989, Johns Hopkins University Press,
p. 170, the entire disclosure of which is incorporated by reference herein.

When the true mass spectral peaks do not coincide exactly with nominal masses, one has
the following linear combination equations (ignoring any baseline components for simplicity
here without loss of generality),

$$s_0 = \sum_i c_i\, p_i + e$$

where peak shape function $p_i$ with center mass $m_i$ can be expanded to 1st order in Taylor series as

$$p_i(m_i) = p_i(m_{i0} + \Delta m_i) \approx p_i(m_{i0}) + \Delta m_i \frac{dp_i(m_{i0})}{dm}$$

with $p_i(m_i)$ being the peak shape function centered at the true mass location $m_i$, $p_i(m_{i0})$ being the
peak shape function centered at the nominal mass location $m_{i0}$ close to $m_i$, $\Delta m_i$ being the
difference between the true and nominal mass location (mass defect or deviation from nominal
mass due to multiple charges), and $dp_i(m_{i0})/dm$ being the analytically calculated 1st derivative of
the peak shape function centered at nominal mass $m_{i0}$.

Taking into account the mass defect, one has the following modified equation

$$s_0 = \sum_{i=1}^{n} c_i\, p_i(m_{i0}) + \sum_{i=1}^{n} c_{n+i}\, \frac{dp_i(m_{i0})}{dm} + e$$

where $c_{n+i} = c_i \Delta m_i$ and $n$ is the number of nominal masses under consideration. Written back
into matrix form, one has

$$s_0 = c P + e$$ (Equation 9)

where both c and P are augmented now by the coefficients in front of the derivative terms and
the derivative terms themselves. It is important to note that because the peak shape functions are
chosen to be symmetrical (so that their derivatives are orthogonal to the peak shape functions themselves), the
inclusion of their derivatives has no adverse effects on the condition of the peak component
matrix P, leading to the most precise mass determination and the most repeatable peak
integration.

The same WMLR described above can be applied to solve Equation 9 and arrive at the
integrated peak areas $c_1, c_2, \ldots, c_n$. In addition, Equation 8 can be used to calculate a standard
deviation for each peak area thus obtained, leading to elegant statistical measures on the quality
of these peak areas.

An improved determination of the center mass locations can be obtained

$$m_i = m_{i0} + \Delta m_i = m_{i0} + \frac{c_{n+i}}{c_i}$$ (Equation 10)

where the relative error in $\Delta m_i$ determination is given by

$$\left|\frac{s(\Delta m_i)}{\Delta m_i}\right| = \left|\frac{s(c_i)}{c_i}\right| + \left|\frac{s(c_{n+i})}{c_{n+i}}\right|$$

with standard deviations for $c_i$ and $c_{n+i}$ available from Equation 8 directly. In other words, the
standard error for shift estimate is

$$s(\Delta m_i) = \left[\left|\frac{s(c_i)}{c_i}\right| + \left|\frac{s(c_{n+i})}{c_{n+i}}\right|\right] \left|\frac{c_{n+i}}{c_i}\right|$$

which is also the standard error for the center mass given in Equation 10.
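
The role of the derivative rows can be illustrated with a single synthetic peak. The sketch below makes several assumptions: the peak is a unit-area Gaussian, the derivative is taken with respect to the peak center (consistent with the Taylor expansion above), and an ordinary unweighted least squares fit replaces WMLR for brevity. The masses, width, and area are made-up values.

```python
import numpy as np

def gaussian_peak(x, center, sigma):
    """Unit-area Gaussian peak centered at `center`."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def gaussian_peak_dcenter(x, center, sigma):
    """Analytical derivative of gaussian_peak with respect to its center."""
    return gaussian_peak(x, center, sigma) * (x - center) / sigma ** 2

x = np.arange(66.0, 72.0, 1.0 / 16.0)
sigma = 0.25
m_nominal = 69.0
m_true, area_true = 68.995, 50.0                  # hypothetical true mass and area
s0 = area_true * gaussian_peak(x, m_true, sigma)  # noise-free calibrated trace

# Peak component matrix augmented by the derivative row (n = 1 nominal mass).
P = np.vstack([gaussian_peak(x, m_nominal, sigma),
               gaussian_peak_dcenter(x, m_nominal, sigma)])
c, *_ = np.linalg.lstsq(P.T, s0, rcond=None)      # unweighted fit of s0 = c P + e
area_est = c[0]
m_est = m_nominal + c[1] / c[0]                   # Equation 10
```

Since the first-order expansion is only approximate, the recovered mass carries a small bias that grows with the ratio of the shift to the peak width. The iterative refinement described later is aimed at this situation.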

The description of the detection of significant peaks will now be given according to an
illustrative embodiment of the present invention. Based on the peak area estimation (Equation 7)
and its standard deviation calculation (Equation 8) from the last section, a t-statistic can be
calculated

$$t_i = c_i / s(c_i) \quad \text{for } i = 1, 2, \ldots, n$$

which can be combined with the degrees of freedom ($df$) to statistically detect whether the
concentration estimate $c_i$ is significantly above zero or not, i.e., the presence or absence of a
mass spectral peak. Typically the $df$ is large enough to be considered infinite and a t-statistic of
more than 3.0 or other user-selected cutoff values indicates the statistically significant presence
of a mass spectral peak. It is noted that a t-statistic cutoff higher than the usual 3.0 value may be
needed to account for the fact that individual mass spectral points after the full calibration
depicted in FIG. 8 will no longer be statistically independent but become correlated in their noise.
Realistic cutoff values can be established through either computer simulation or practical
experience.

FIG. 10A is a diagram illustrating a stick spectrum 1010 reflecting the t-statistic as a
function of the exact mass locations (Equation 10) for possible mass spectral peaks across the
mass range (raw mass spectrum taken from FIG. 1), according to an illustrative embodiment of
the present invention. FIGs. 10B and 10C are diagrams illustrating the overlay 1020 of the raw
MS spectral segment and its fully calibrated version 1030, according to an illustrative
embodiment of the present invention. FIG. 10D is a diagram illustrating the corresponding
t-statistic 1040 and a horizontal cutoff line 1050 with critical t values set at 12, according to an
illustrative embodiment of the present invention. The high degree of simultaneous noise
filtering/signal averaging and peak shape calibration can be clearly seen in FIG. 10B, which
greatly facilitates the peak analysis with highly sensitive results shown in FIG. 10D, where the
detection is only limited by the random noise in the data with no artifacts or other sources of
systematic errors.

The mass spectral peaks with t-statistics above the cutoff will then be reported as
statistically significant while those below the cutoff will be reported as not significant. Along
with the t-statistic, the exact mass locations and the integrated peak areas can also be reported for
the identification and quantification of particular molecules having the corresponding ion
fragments. While the F-statistic could have been more rigorously applied here, it is believed that the
marginal t-statistic would be sufficient due to the minimal interactions (small co-variances)
between the peak components. Multicollinearity and the application of the F-statistic are further
described by John Neter et al., in Applied Linear Regression, 2nd Ed., Irwin, 1989, p. 300, the
entire disclosure of which is incorporated by reference herein.

The description of the refinement for peak analysis will now be given according to an
illustrative embodiment of the present invention. When a higher degree of mass accuracy is
desired, one may construct an iterative peak analysis process by treating the results obtained
above as initial estimates, and update the peak component matrix P using the newly calculated
center mass locations from Equation 10. Since the updated mass locations would not be spaced
one nominal mass unit apart from each other, each peak component and its derivative form in P
will need to be separately calculated analytically for all peaks of significance (based on the t-test
described above). With the new P matrix constructed, new estimates for c can be calculated,
giving another update on the center mass locations:

where k = 1, 2, ... and m^(0) = m_0 (nominal center mass locations). This iterative improvement
will be completed when the incremental update c_{n+i}^(k) becomes comparable to the standard
deviation predicted from Equation 8. With such refinement implemented, extremely high mass
accuracy can be achieved for strong mass spectral peaks due to the high signal to noise available
for such peaks, for example, 2 ppm mass accuracy for the peak at mass 69 in FIG. 1. The mass
accuracy will deteriorate as the peak intensity drops due to the decreased number of ions
available for detection. In other words, the mass accuracy will be limited only by the random
noise in the data but not by other artifacts or systematic errors, such as the presence of chemical
noise, interference from the isotope peaks, irregular peak shapes, or unknown baselines, as these
artifacts would have been fully compensated for by the calibration and peak analysis approaches
taken here.
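
By way of illustration only, the iterative refinement described above may be sketched in Python for a single isolated peak. The sketch rests on several assumptions that are not taken from this disclosure: a Gaussian stands in for the calibrated peak shape function, the data are noise-free, the names `refine_center`, `gaussian`, and `gaussian_deriv` are hypothetical, and the sign of the mass shift follows from a first-order Taylor expansion of the peak shape rather than from Equation 10 itself.

```python
import numpy as np

def gaussian(x, sigma):
    """Unit-area Gaussian peak shape (assumed stand-in for the target peak shape)."""
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def gaussian_deriv(x, sigma):
    """Analytical first derivative of the unit-area Gaussian."""
    return -x / sigma**2 * gaussian(x, sigma)

def refine_center(mass, signal, m_nominal, sigma, tol=1e-9, max_iter=20):
    """Iteratively refine one peak center, starting from the nominal mass."""
    m_k = float(m_nominal)
    for _ in range(max_iter):
        x = mass - m_k
        # Two-column peak component matrix: peak shape and its first derivative,
        # both recalculated analytically at the current center estimate.
        P = np.column_stack([gaussian(x, sigma), gaussian_deriv(x, sigma)])
        (area, d_coef), *_ = np.linalg.lstsq(P, signal, rcond=None)
        # First-order Taylor expansion: a*p(x - d) ~ a*p(x) - a*d*p'(x),
        # so the shift d is recovered as -d_coef / area (assumed sign convention).
        shift = -d_coef / area
        m_k += shift
        if abs(shift) < tol:
            break
    return m_k, area

# Synthetic, noise-free peak near nominal mass 69, sampled at 1/10 amu.
mass = np.linspace(68.0, 70.0, 21)
true_center, true_area, sigma = 69.003, 1000.0, 0.25
signal = true_area * gaussian(mass - true_center, sigma)

center, area = refine_center(mass, signal, m_nominal=69.0, sigma=sigma)
print(center, area)
```

With real data the loop would stop once the update is comparable to the predicted standard deviation, rather than at a fixed numerical tolerance as in this sketch.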

A description will now be given of some of the many attendant advantages and features
of the present invention. The present invention provides a method for processing mass
spectrometry data that is mathematically elegant, statistically sound, and physics-based.
Beneficially, the present invention considers the presence of noise and isotope peaks as
additional useful information in the overall scheme. The present invention handles noise, isotope
distribution, multiple charges, baselines, peak identification, peak positioning, and peak
quantitation, all simultaneously in one integrated process. The present invention combines
occasional MS calibration with routine MS data analysis, and can drastically improve mass
accuracy for either high- or low-resolution MS systems. On conventional MS systems with unit
mass resolution (FWHM = 0.5-0.7 amu), mass accuracy at the 1-5 ppm level can be achieved. The
present invention includes built-in baseline determination, noise filtering/signal averaging, and
peak integration. The present invention is computationally efficient such that it can be employed
for on-the-fly data reduction on GC/MS or LC/MS or other time-dependent MS detection
systems. The present invention has output statistics for instrument diagnostics and data quality
control. Moreover, the present invention involves all linear operators with predictable behaviors
towards noise and other artifacts. The present invention achieves high mass precision for strong
peaks and high sensitivity for weak peaks with wide dynamic range coverage. The present
invention allows for the standardization of all different types of MS instruments and for
universal highly accurate library searches. This allows for molecular fingerprinting at much
reduced cost in complex matrices even without the need for separation due to the high mass
accuracy achievable.

While the above mass spectral calibration and peak analysis have been described for typical
mass spectrometry systems having at least unit mass resolution, it is further appreciated that even
for low resolution mass spectrometry systems that do not differentiate peaks located within unit
masses, the above mass spectral calibration brings significant and intrinsic advantages. In low
resolution mass spectrometry systems, no explicit peak identification is feasible due to the lack
of spectral resolution. Instead of the conventional peak analysis including peak identification
and quantification, the complete mass spectral trace is used as input to multivariate statistical
analysis for either analyte quantification through multivariate calibration or sample classification
through cluster analysis or pattern recognition. These multivariate statistical approaches include
Principal Component Analysis (PCA) or Principal Component Regression (PCR), as described
by Bruce Kowalski et al., in J. Chemometrics, 1991, 5, 129, the entire disclosure of which is
incorporated by reference herein. One key factor for the successful application of these
multivariate statistical approaches is the high mass accuracy and consistent peak shape functions
between samples and instruments, as described by Yongdong Wang et al., in Anal. Chem., 1991,
63, 2750, the entire disclosure of which is incorporated by reference herein. The complete mass
spectral calibration introduced by this invention should properly align both the mass axes and
mass spectral peak shape functions between different samples or instruments to allow for highly
accurate multivariate spectral comparison for the purpose of either analyte quantification or
sample classification (as used in biomarker discovery).

FIG. 11 is a diagram illustrating a method for operating a Mass Spectrometry (MS)
instrument system, according to an illustrative embodiment of the present invention. The MS
instrument system is calibrated with respect to at least peak shape and mass axis (step 1110). It
is to be appreciated that step 1110 can be broken down into steps 1110A-1110H below.

It is to be further appreciated that steps 1110E-1110H are optional. If steps 1110E-1110H
are performed, then the method of FIG. 13 may be performed subsequent to the method
of FIG. 11. However, if steps 1110E-1110H are omitted, then the method of FIG. 14 may be
performed subsequent to the method of FIG. 11.

At step 1110A, relative abundances and exact mass locations of the isotopes are
calculated for a given calibration standard.

At step 1110B, isotope masses are pre-aligned based on calculated isotope peak clusters
and measured isotope peak clusters corresponding to the calibration standard, so as to calibrate a
mass axis of the MS instrument system.

At step 1110C, peak shape functions are derived corresponding to the calculated and
measured isotope peak clusters.

At step 1110D, data corresponding to the derived peak shape functions is interpolated to
obtain other peak shape functions within desired mass ranges. Each of the derived peak shape
functions and the other peak shape functions correspond to the actually measured mass locations.

At step 1110E, the peak shape functions and the other peak shape functions are converted
to target peak shape functions centered at exactly the mid-point in the desired mass ranges.

At step 1110F, calibration filters are calculated from the target peak shape functions and
the calculated peak shape functions.

At step 1110G, the calibration filters are interpolated onto a finer grid.

At step 1110H, the calibration filters are applied so as to calibrate the MS instrument
system.

FIG. 12 is a diagram further illustrating step 1110H of the method of FIG. 11, according
to an illustrative embodiment of the present invention. Step 1110H includes steps 1210A-1210C
below.

At step 1210A, a pre-calibration matrix is calculated. Calculation of the pre-calibration
matrix includes converting non-uniformly spaced data to uniformly spaced data, such conversion
including pre-alignment of mass axis and optionally including transformation for TOF, FTMS, or
magnetic sector instruments.

At step 1210B, a calibration matrix is calculated. The calculation of the calibration
matrix includes creating a banded diagonal matrix where the non-zero elements in each column
are taken from the elements of the convolution filter in reverse order after shifting.

At step 1210C, a post-calibration matrix is calculated. The calculation of the post-
calibration matrix includes interpolating from internal spacing to reported or desired spacing and
converting transformed space back into original mass space.
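
By way of illustration only, the banded calibration matrix of step 1210B may be sketched in Python as follows. The sketch assumes a single odd-length convolution filter applied across the whole mass range (in practice the filter may vary with mass), a spectrum stored as a row vector that multiplies the matrix from the left, and a dense NumPy array in place of sparse storage. The function name `banded_calibration_matrix` and the filter values are hypothetical.

```python
import numpy as np

def banded_calibration_matrix(filt, n):
    """Build an n x n banded matrix F so that (row vector) s @ F applies the filter.

    Column j holds the filter elements in reverse order, shifted so that the
    filter center sits on the diagonal.
    """
    filt = np.asarray(filt, dtype=float)
    h = len(filt) // 2              # half-width; assumes an odd-length filter
    F = np.zeros((n, n))
    for j in range(n):
        for k, value in enumerate(filt):
            i = j + h - k           # row index falls as k rises: reverse order
            if 0 <= i < n:
                F[i, j] = value
    return F

# Deliberately asymmetric filter so that the element ordering matters.
filt = [0.1, 0.6, 0.3]
s = np.array([0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0, 2.0])

F = banded_calibration_matrix(filt, len(s))
calibrated = s @ F
print(calibrated)
```

Under these assumptions the matrix product reproduces an ordinary centered convolution of the spectrum with the filter, which is why the filter elements appear reversed within each column.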

FIG. 13 is a diagram illustrating a method for analyzing a Mass Spectrometry (MS)
spectrum obtained from an MS instrument system, according to an illustrative embodiment of the
present invention.

Peaks in the MS spectrum are analyzed after full calibration (step 1310). It is preferable,
but not necessary, that the peak shape functions are identical across the full mass spectral
range.

Calibrated MS data having a mass spacing preferably equal to an integer fraction (e.g.,
1/4, 1/5, 1/8, 1/10, 1/12, 1/16) of the nominal mass spacing (e.g., 1 amu) is received (step
1310A).

One pair of matrix rows of a full peak component matrix is calculated, such that one row
of the pair stores a target peak shape function that has been normalized to unit peak area and the
other row of the pair stores the first derivative of the target peak shape function stored in the one
row of the pair, and such that both the target peak shape function and its first derivative have
been sampled at the integer fraction of the nominal mass spacing (step 1310B).

The full peak component matrix is completed by indexing the matrix such that peak
components in the remainder of the rows are arranged as shifted versions of each other
corresponding to each nominal mass within the full mass spectral range (step 1310C).

A Weighted Multiple Linear Regression (WMLR) operation is performed using the
inverse of the mass spectral variances as weights to calculate integrated peak areas and mass
deviations at all nominal masses within the full mass spectral range (step 1310D).

Standard deviations are calculated for all peak areas and mass deviations (step 1310E).

Nominal masses are updated into actual masses by adding in the calculated mass
deviations from corresponding nominal masses (step 1310F).

The performing (step 1310D), calculating (1310E) and updating (1310F) steps are
repeated until any incremental improvements in either the peak areas or the mass deviations are
smaller than corresponding standard deviations or other preset criteria (step 1310G). If the
incremental improvements in either the peak areas or the mass deviations are not smaller than the
corresponding standard deviations or other preset criteria, then the full peak component matrix is
constructed using the actual masses (step 1310H), and the method returns to step 1310D.
Otherwise, the method proceeds to step 1310I.

T-statistics are calculated for all of the peak areas (step 1310I), to obtain a mass spectral
peak list that includes peak areas and exact masses for statistically significant mass peaks (step
1310J).
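
By way of illustration only, one pass through steps 1310B-1310F and 1310I may be sketched in Python as follows. The sketch assumes a Gaussian target peak shape, a synthetic noise-free spectrum, a variance model equal to the signal plus a constant floor, and a mass deviation computed as the negative ratio of the derivative coefficient to the area coefficient (from a first-order Taylor expansion). None of these choices, nor the function names, are taken from this disclosure.

```python
import numpy as np

def peak_and_derivative(x, sigma):
    """Unit-area Gaussian peak shape and its first derivative (assumed shape)."""
    p = np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return p, -x / sigma**2 * p

def wmlr_peak_analysis(mass, signal, variance, nominal_masses, sigma):
    n = len(nominal_masses)
    # Full peak component matrix: rows 0..n-1 hold shifted peak shapes,
    # rows n..2n-1 hold the corresponding first derivatives.
    P = np.empty((2 * n, len(mass)))
    for i, m0 in enumerate(nominal_masses):
        P[i], P[n + i] = peak_and_derivative(mass - m0, sigma)
    w = 1.0 / variance                      # inverse variances as weights
    cov = np.linalg.inv((P * w) @ P.T)      # covariance of the coefficients
    c = cov @ ((P * w) @ signal)            # weighted least-squares solution
    std = np.sqrt(np.diag(cov))
    areas = c[:n]
    t_stat = areas / std[:n]
    with np.errstate(divide="ignore", invalid="ignore"):
        mass_dev = -c[n:] / areas           # assumed sign convention
    return areas, mass_dev, std, t_stat

mass = np.linspace(66.0, 72.0, 61)          # 1/10 amu spacing
nominal = np.arange(67, 72)                 # nominal masses 67..71
sigma = 0.25
p_main, _ = peak_and_derivative(mass - 69.01, sigma)
p_iso, _ = peak_and_derivative(mass - 70.0, sigma)
signal = 500.0 * p_main + 20.0 * p_iso      # two real peaks, three empty masses
variance = signal + 1.0                     # counting noise plus a constant floor

areas, mass_dev, std, t_stat = wmlr_peak_analysis(mass, signal, variance, nominal, sigma)
significant = t_stat > 3.0
print(nominal[significant], areas[significant], mass_dev[significant])
```

Mass deviations are only meaningful for peaks that pass the t-test; for empty nominal masses the ratio divides by an area near zero and should be disregarded.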

FIG. 14 is a diagram illustrating a method for analyzing a Mass Spectrometry (MS)
spectrum, according to an illustrative embodiment of the present invention.

Peaks in the MS spectrum are analyzed after determining peak shape functions covering
the full mass spectral range (step 1410). Peak shape interpolations are performed to obtain one
peak shape function at each nominal mass (step 1410A).

First derivatives of the peak shape functions are calculated at all nominal masses (step
1410B). Peak shape functions and the corresponding first derivatives are combined into a full
peak component matrix (step 1410C).

A Weighted Multiple Linear Regression (WMLR) operation is performed using the
inverse of the mass spectral variances as weights to calculate integrated peak areas and mass
deviations at all nominal masses within the full mass spectral range (step 1410D).

Standard deviations are calculated for all peak areas and mass deviations (step 1410E).

Nominal masses are updated into actual masses by adding in the calculated mass
deviations from corresponding nominal masses (step 1410F).

The performing (step 1410D), calculating (1410E) and updating (1410F) steps are
repeated until any incremental improvements in either the peak areas or the mass deviations are
smaller than corresponding standard deviations or other preset criteria (step 1410G). If the
incremental improvements in either the peak areas or the mass deviations are not smaller than the
corresponding standard deviations or other preset criteria, then the full peak component matrix is
reconstructed using the actual masses (step 1410H), and the method returns to step 1410D.
Otherwise, the method proceeds to step 1410I.

T-statistics are calculated for all of the peak areas (step 1410I), to obtain a mass spectral
peak list that includes peak areas and exact masses for statistically significant mass peaks (step
1410J).

FIG. 15 is a flow diagram illustrating a method for creating calibration filters for a Mass
Spectrometry (MS) instrument system, according to an illustrative embodiment of the present
invention.

One or more compounds are selected as a Mass Spectrometry (MS) standard (1510). MS
profile data is acquired on the MS standard(s) (step 1510A). Each ion fragment cluster is
identified (step 1510B).

Following step 1510B, it is determined whether significant isotopes exist (step 1510N).
If so, relative isotope abundances are calculated at exact masses (step 1510C). A pre-calibration
step is performed (step 1510D). The pre-calibration step may involve performing pre-calibration
instrument-dependent transformations on raw data, performing a pre-calibration mass spacing
adjustment, and/or pre-aligning mass spectral isotope peaks.
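
By way of illustration only, the calculation of relative isotope abundances at exact masses (step 1510C) may be sketched in Python for a CF3 fragment, chosen here merely because a peak at mass 69 is mentioned above. The isotope table is limited to carbon and fluorine with commonly tabulated masses and natural abundances, the electron mass is ignored, and the function name `isotope_cluster` is hypothetical.

```python
from collections import defaultdict

# Isotope masses (u) and natural abundances; illustrative values for C and F only.
ISOTOPES = {
    "C": [(12.0, 0.9893), (13.0033548, 0.0107)],
    "F": [(18.9984032, 1.0)],
}

def isotope_cluster(formula, decimals=7):
    """Return sorted (exact mass, relative abundance) pairs for an elemental formula.

    The cluster is built by adding one atom at a time and combining every
    existing (mass, abundance) pair with each isotope of that atom.
    """
    cluster = {0.0: 1.0}
    for element, count in formula.items():
        for _ in range(count):
            updated = defaultdict(float)
            for mass, abundance in cluster.items():
                for iso_mass, iso_abundance in ISOTOPES[element]:
                    # Rounding merges isotopologues that share the same mass.
                    updated[round(mass + iso_mass, decimals)] += abundance * iso_abundance
            cluster = dict(updated)
    return sorted(cluster.items())

cluster = isotope_cluster({"C": 1, "F": 3})
for exact_mass, abundance in cluster:
    print(f"{exact_mass:.7f}  {abundance:.4f}")
```

The resulting stick cluster is the kind of calculated isotope distribution that the subsequent convolution and deconvolution steps compare against the measured isotope peak cluster.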

It is then determined whether obtaining peak shape functions is desired (step 1510E). If
so, convolution operations are performed on both the calculated relative isotope abundances and
the measured isotope peak clusters using the same continuous function with a narrow peak
width, and then a deconvolution operation is performed between the measured isotope peak
clusters and the resulting isotope peak clusters after the convolution operations (step 1510T) to
obtain at least one peak shape function (1510P), and the method proceeds to step 1510Q.
Otherwise, convolution operations are performed between the calculated relative isotope
abundances and the target peak shape functions (step 1510F) and a deconvolution operation is
performed between the measured isotope peak clusters and the resulting isotope peak cluster after
the convolution operations (step 1510G) to obtain at least one calibration filter (1510H).

Also following step 1510B, it is determined whether significant isotopes exist (step
1510N). If not, a pre-calibration step is performed (1510O). The pre-calibration step may
involve performing pre-calibration instrument-dependent transformations on raw data,
performing a pre-calibration mass spacing adjustment, and/or pre-aligning mass spectral isotope
peaks.

The peak shape functions thus obtained (1510P) are interpolated (step 1510Q) before the
deconvolution operation (1510S) with specified target peak shape functions (step 1510R).

At step 1510S, a deconvolution operation is performed between mass spectral target peak
shape functions and one of measured mass spectral peak shape functions and the calculated mass
spectral peak shape functions to convert the mass spectral peak shape functions and the at least
one other mass spectral peak shape function to the mass spectral target peak shape functions
centered at mid-points within respective mass ranges covered by the mass spectral peak shape
functions and the at least one other mass spectral peak shape function. At least one calibration
filter is calculated from the mass spectral target peak shape functions centered at the mid-points
within the respective mass ranges covered by the mass spectral peak shape functions and the at
least one other mass spectral peak shape function (step 1510H).

An interpolation operation is performed between two calibration filters to obtain at least
one other calibration filter within a desired mass range (step 1510I).

A full calibration filter set is obtained from the calibration filters of step 1510H and any
resulting from the interpolation of step 1510I (step 1510J). A post-calibration step is performed
(step 1510K). The post-calibration step may involve performing post-calibration instrument-
dependent transformations and/or performing a post-calibration mass spacing adjustment.

Data are combined corresponding to the pre-calibration step 1510O, the full calibration
filter of step 1510J, and the post-calibration step 1510K (step 1510L) to obtain a total calibration
filter set F1 and a variance filter set F2 (step 1510M).
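
By way of illustration only, the derivation of a single calibration filter by deconvolution between a measured peak shape function and a target peak shape function may be sketched in Python as follows. The sketch assumes a regularized division in the Fourier domain, which is only one of several possible deconvolution techniques and is not asserted to be the one used herein. The peak shapes, the regularization constant `eps`, and the function names are likewise hypothetical.

```python
import numpy as np

def gaussian(x, sigma):
    """Gaussian sampled on x and normalized to unit sum."""
    g = np.exp(-0.5 * (x / sigma) ** 2)
    return g / g.sum()

def calibration_filter(measured, target, eps=1e-12):
    """Filter f such that measured (*) f ~ target, via regularized FFT division."""
    M = np.fft.fft(np.fft.ifftshift(measured))
    T = np.fft.fft(np.fft.ifftshift(target))
    # eps keeps the division stable where the measured spectrum is near zero.
    F = T * np.conj(M) / (np.abs(M) ** 2 + eps)
    return np.fft.fftshift(np.real(np.fft.ifft(F)))

def apply_filter(signal, filt):
    """Circular convolution of two centered arrays of equal length."""
    S = np.fft.fft(np.fft.ifftshift(signal))
    F = np.fft.fft(np.fft.ifftshift(filt))
    return np.fft.fftshift(np.real(np.fft.ifft(S * F)))

# Mass offsets from the mid-point of the range, 0.02 amu spacing.
x = (np.arange(256) - 128) * 0.02

# Measured shape: off-center and tailing. Target shape: symmetric and centered.
measured = 0.7 * gaussian(x - 0.03, 0.25) + 0.3 * gaussian(x - 0.10, 0.30)
target = gaussian(x, 0.35)

filt = calibration_filter(measured, target)
recovered = apply_filter(measured, filt)
print(np.max(np.abs(recovered - target)))
```

The target is chosen slightly wider than the measured shape so that the resulting filter is smooth; a narrower target would amplify high-frequency noise in the division.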

FIG. 16 is a block diagram illustrating a method for processing Mass Spectrometry (MS)
data, according to an illustrative embodiment of the present invention.

MS profile data is acquired on test samples (step 1610). The profile data is interpolated if
necessary (step 1610A). Sparse matrix multiplication is performed with the total calibration
filter set F1 and/or the variance filter set F2 (step 1610B). Calibrated data is then interpolated
into reported mass spacing if necessary (step 1610C).
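
By way of illustration only, the matrix multiplication of step 1610B may be sketched in Python as follows. The sketch assumes that the variance filter set F2 is the element-wise square of F1, which is the standard propagation rule for a linear operation on statistically independent points and is stated here as an assumption rather than as the definition used herein. A small dense NumPy matrix stands in for the sparse banded matrix, and the name `apply_calibration` is hypothetical.

```python
import numpy as np

def apply_calibration(raw, raw_variance, F1):
    """Calibrate a spectrum (row vector) and propagate its point-wise variance."""
    F2 = F1 ** 2                              # element-wise square (assumed form of F2)
    calibrated = raw @ F1
    calibrated_variance = raw_variance @ F2   # valid for independent raw points
    return calibrated, calibrated_variance

# Hypothetical 3-point smoothing filter laid out as a banded matrix.
n = 5
F1 = 0.5 * np.eye(n) + 0.25 * np.eye(n, k=1) + 0.25 * np.eye(n, k=-1)

raw = np.array([0.0, 10.0, 40.0, 10.0, 0.0])
raw_variance = raw.copy()                     # counting noise: variance equals signal

calibrated, calibrated_variance = apply_calibration(raw, raw_variance, F1)
print(calibrated, calibrated_variance)
```

The propagated variances describe each calibrated point individually; they do not capture the correlation that the filtering introduces between neighboring points.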

The mass spectral variances are reported at each mass sampling point (step 1610G), and
the method proceeds to step 1610H. Also following step 1610C, the mass spectral data is
calibrated for both mass and peak shape (step 1610D), and the method proceeds to step 1610E.

At step 1610E, it is determined whether the MS instrument system used is of a high
enough resolution to allow for mass spectral peak identification. If so, then the method proceeds
to step 1610H for mass spectral peak identification. Otherwise, the method proceeds to step
1610F for direct comparison of full mass spectral data without explicit peak identification.

At step 1610H, a mass spectral peak quantification and accurate mass determination step
is performed. At step 1610F, a quantitative analysis is performed via multivariate calibration or
a qualitative analysis is performed via pattern recognition/cluster analysis using the full mass
spectral response curve as inputs without explicit mass spectral peak identification.

Although the illustrative embodiments have been described herein with reference to the
accompanying drawings, it is to be understood that the present invention is not limited to those
precise embodiments, and that various other changes and modifications may be effected therein
by one of ordinary skill in the related art without departing from the scope or spirit of the
invention. All such changes and modifications are intended to be included within the scope of
the invention as defined by the appended claims.